A Framework for Transparent Execution of Massively-Parallel Applications on CUDA and OpenCL

Authors

  • Jörn Teuber
  • Rene Weller
  • Gabriel Zachmann
Abstract

We present a novel framework for the simultaneous development for different massively parallel platforms. Currently, our framework supports CUDA and OpenCL, but it can easily be adapted to other programming languages. The main idea is to provide an easy-to-use abstraction layer that encapsulates calls to the user's own parallel device code as well as to library functions. With our framework, the code has to be written only once and can then be used transparently with CUDA and OpenCL. The output is a single binary file, and the application can decide at run time which particular GPU method it will use. This enables us to support new features of specific platforms while maintaining compatibility. We have applied our framework to a typical project using CUDA and easily ported it to OpenCL. Furthermore, we present a comparison of the running times of the ported library on the different supported platforms.
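
To illustrate the architecture the abstract describes, the following minimal C++ sketch shows one abstraction layer with a run-time backend decision. All names in it (ComputeBackend, CudaBackend, OpenCLBackend, selectBackend) and the saxpy example operation are assumptions made for illustration, not the framework's actual API, and the backend bodies are placeholders where the real CUDA and OpenCL calls would go.

    // Minimal sketch of a run-time-selectable GPU abstraction layer.
    // All names (ComputeBackend, CudaBackend, OpenCLBackend, selectBackend)
    // are hypothetical illustrations, not the framework's actual API.
    #include <cstddef>
    #include <iostream>
    #include <memory>
    #include <vector>

    // Common interface that hides whether CUDA or OpenCL runs underneath.
    class ComputeBackend {
    public:
        virtual ~ComputeBackend() = default;
        virtual const char* name() const = 0;
        // Single entry point standing in for "launch my device kernel".
        virtual void saxpy(float a, const std::vector<float>& x,
                           std::vector<float>& y) = 0;
    };

    // Stub CUDA backend: a real build would allocate device memory,
    // copy the data, and launch a __global__ kernel here.
    class CudaBackend : public ComputeBackend {
    public:
        const char* name() const override { return "CUDA"; }
        void saxpy(float a, const std::vector<float>& x,
                   std::vector<float>& y) override {
            for (std::size_t i = 0; i < y.size(); ++i)  // placeholder host loop
                y[i] += a * x[i];
        }
    };

    // Stub OpenCL backend: a real build would create buffers, enqueue the
    // kernel on a command queue, and read the result back here.
    class OpenCLBackend : public ComputeBackend {
    public:
        const char* name() const override { return "OpenCL"; }
        void saxpy(float a, const std::vector<float>& x,
                   std::vector<float>& y) override {
            for (std::size_t i = 0; i < y.size(); ++i)  // placeholder host loop
                y[i] += a * x[i];
        }
    };

    // Run-time decision: both backends live in the same binary and the
    // application picks one, e.g. depending on the devices it detects.
    std::unique_ptr<ComputeBackend> selectBackend(bool cudaAvailable) {
        if (cudaAvailable)
            return std::make_unique<CudaBackend>();
        return std::make_unique<OpenCLBackend>();
    }

    int main() {
        std::vector<float> x(4, 1.0f), y(4, 2.0f);
        auto backend = selectBackend(/*cudaAvailable=*/false);
        backend->saxpy(3.0f, x, y);
        std::cout << "ran on " << backend->name() << ", y[0] = " << y[0] << "\n";
    }

Because both backends sit behind the same interface in one binary, backend-specific features can be added to one implementation without affecting code that uses the other, which is the compatibility property the abstract refers to.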

Similar articles

A Scalable Software Framework for Stateful Stream Data Processing on Multiple GPUs and Applications

During the past few years, the increase of computational power has been realized using more processors with multiple cores and specific processing units like Graphics Processing Units (GPUs). Also, the introduction of programming languages such as CUDA and OpenCL makes it easy, even for non-graphics programmers, to exploit the computational power of massively parallel processors available in cur...

Implementing the Dslash Operator in OpenCL (Technical Report WM-CS-2010-03, Department of Computer Science, College of William & Mary)

The Dslash operator is used in Lattice Quantum Chromodynamics (LQCD) applications to implement a Wilson-Dirac sparse matrix-vector product. Typically the Dslash operation has been implemented as a parallel program. Today’s Graphics Processing Units (GPUs) are designed to do highly parallel numerical calculations for 3D graphics rendering. This design works well with scientific applications such ...

The Design and Implementation of Ocelot’s Dynamic Binary Translator from PTX to Multi-Core x86

Ocelot is a dynamic compilation framework designed to map the explicitly parallel PTX execution model used by NVIDIA CUDA applications onto diverse many-core architectures. Ocelot includes a dynamic binary translator from PTX to many-core processors that leverages the LLVM code generator to target x86. The binary translator is able to execute CUDA applications without recompilation and Ocelot c...

On the Complexity of Robust Source-to-Source Translation from CUDA to OpenCL

The use of hardware accelerators in high-performance computing has grown increasingly prevalent, particularly due to the growth of graphics processing units (GPUs) as general-purpose (GPGPU) accelerators. Much of this growth has been driven by NVIDIA’s CUDA ecosystem for developing GPGPU applications on NVIDIA hardware. However, with the increasing diversity of GPUs (including those from AMD, AR...

High-Level Programming of Stencil Computations on Multi-GPU Systems Using the SkelCL Library

The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA. This makes development of stencil applications a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high...

Publication date: 2015